Exploiting Belief Locality in Run-Time Decision-Theoretic Planners
Abstract
Partially Observable Markov Decision Processes (POMDPs) have become a popular means of representing realistic planning problems, but exact approaches to finding POMDP policies are computationally very expensive. An alternative approach to control in POMDP domains is run-time optimization over action sequences in a dynamic decision network. Whereas exact algorithms must generate a policy over the entire belief space, a run-time approach only needs to reason about the reachable parts of the belief space. Combining run-time planning with caching makes it possible to exploit the locality of belief states in POMDPs, allowing significant speedups in plan generation. This paper introduces and evaluates an exact caching mechanism and a grid-based caching mechanism.
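To make the idea of caching over reachable belief states concrete, here is a minimal Python sketch. It is not the paper's algorithm or its dynamic decision network. It is a plain depth-limited look-ahead over beliefs, with an optional cache keyed either on the (rounded) exact belief or on a coarse grid cell. The names `plan_value`, `exact_key`, `grid_key`, `MODEL`, and `B0`, the two-state model numbers, the discount factor, and the grid resolution are all illustrative assumptions.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple

Belief = Tuple[float, ...]

# Hypothetical 2-state, 2-action, 2-observation model (numbers are made up).
T = [[[1.0, 0.0], [0.0, 1.0]],      # T[a][s][s2]: action 0 leaves the state alone
     [[0.7, 0.3], [0.4, 0.6]]]      # action 1 moves the state stochastically
O = [[[0.8, 0.2], [0.3, 0.7]],      # O[a][s2][z]: action 0 is informative
     [[0.5, 0.5], [0.5, 0.5]]]      # action 1 yields no information
R = [[-1.0, -1.0], [10.0, -5.0]]    # R[a][s]: immediate reward
MODEL = (T, O, R)
B0: Belief = (0.5, 0.5)

def exact_key(b: Belief, decimals: int = 12) -> Hashable:
    """Exact cache key: rounding only absorbs floating-point noise."""
    return tuple(round(p, decimals) for p in b)

def grid_key(b: Belief, resolution: int = 10) -> Hashable:
    """Grid cache key: nearby beliefs fall into the same cell (approximate)."""
    return tuple(round(p * resolution) for p in b)

def belief_update(b: Belief, a: int, z: int) -> Tuple[Belief, float]:
    """Bayes filter step; returns the new belief and P(z | b, a)."""
    n = len(b)
    unnorm = [O[a][s2][z] * sum(b[s] * T[a][s][s2] for s in range(n))
              for s2 in range(n)]
    pz = sum(unnorm)
    if pz == 0.0:
        return b, 0.0
    return tuple(u / pz for u in unnorm), pz

def plan_value(b: Belief, depth: int, model=MODEL,
               key_fn: Optional[Callable[[Belief], Hashable]] = None,
               cache: Optional[Dict] = None, gamma: float = 0.95) -> float:
    """Best expected discounted reward over `depth` steps from belief b."""
    if depth == 0:
        return 0.0
    use_cache = cache is not None and key_fn is not None
    if use_cache:
        key = (key_fn(b), depth)
        if key in cache:            # belief (or its grid cell) already solved
            return cache[key]
    best = float("-inf")
    for a in range(len(T)):
        v = sum(b[s] * R[a][s] for s in range(len(b)))
        for z in range(len(O[a][0])):
            b2, pz = belief_update(b, a, z)
            if pz > 0.0:
                v += gamma * pz * plan_value(b2, depth - 1, model,
                                             key_fn, cache, gamma)
        best = max(best, v)
    if use_cache:
        cache[key] = best
    return best

exact_cache: Dict = {}
grid_cache: Dict = {}
print("no cache   :", plan_value(B0, 4))
print("exact cache:", plan_value(B0, 4, key_fn=exact_key, cache=exact_cache))
print("grid cache :", plan_value(B0, 4, key_fn=grid_key, cache=grid_cache))
print("entries (exact, grid):", len(exact_cache), len(grid_cache))
```

In this sketch the exact key reproduces the uncached value up to floating-point noise, while the grid key trades accuracy for more cache hits by treating all beliefs in a cell as interchangeable. How the paper itself discretizes beliefs or bounds the resulting error is not stated in the abstract.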
Similar resources
Using Loops in Decision-Theoretic Refinement Planners
Classical AI planners use loops over subgoals to move a stack of blocks by repeatedly moving the top block. Probabilistic planners and reactive systems repeatedly try to pick up a block to increase the probability of success in an uncertain environment. These planners terminate a loop only when the goal is achieved or when the probability of success has reached some threshold. The tradeoff betw...
Exploiting Locality in Probabilistic Inference
By Mark Andrew Paskin. Doctor of Philosophy in Computer Science, University of California, Berkeley. Professor Stuart J. Russell, Chair. This thesis investigates computational properties of decomposable probability models, which can be represented in terms of marginals over subsets of variables. We show that decomposable representations have significan...
Search Control of Plan Generation in Decision-Theoretic Planners
This paper addresses the search control problem of selecting which plan to refine next in decision-theoretic planners, a choice point common to the decision-theoretic planners created to date. Such planners can use a utility function to calculate bounds on the expected utility of an abstract plan. Three strategies for using these bounds to select the next plan to refine have been propo...
Directing a Portfolio with Learning
Algorithm portfolios are one approach designed to harness algorithm bias to increase robustness across a range of problems. We conjecture that learning based on the previous performance of a suite of planning systems can direct a portfolio strategy at each of three stages: selecting which planners to run; ranking the order to run the selected planners; and allocating computational time to them....
Belief Change Based on Global Minimisation
A general framework for minimisation-based belief change is presented. A problem instance is made up of an undirected graph, where a formula is associated with each vertex. For example, vertices may represent spatial locations, points in time, or some other notion of locality. Information is shared between vertices via a process of minimisation over the graph. We give equivalent semantic and sy...